Toward Selectivity Based Keyword Extraction for Croatian News
Abstract
We propose a novel network measure, node selectivity, for the task of keyword extraction. Node selectivity is defined as the average strength of a node. First, we show that selectivity-based keyword extraction slightly outperforms extraction based on the standard centrality measures: in-degree, out-degree, betweenness, and closeness. Furthermore, from a data set of Croatian news we extract keyword candidates and expand the extracted nodes to word-tuples ranked by the highest in/out selectivity values. The obtained sets are evaluated against manually annotated keywords: for the set of extracted keyword candidates, the average F1 score is 24.63% and the average F2 score is 21.19%; for the extracted word-tuple candidates, the average F1 score is 25.9% and the average F2 score is 24.47%. Selectivity-based extraction does not require linguistic knowledge, since it is derived purely from statistical and structural information of the network.
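As a rough illustration of the measure described in the abstract, the sketch below computes in- and out-selectivity on a small directed, weighted co-occurrence network. It assumes that "average strength" means a node's strength (sum of edge weights) divided by its degree (number of distinct neighbours), computed separately for incoming and outgoing links; the abstract does not spell this out. The function names `build_cooccurrence` and `selectivity` and the toy sentences are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def build_cooccurrence(sentences):
    """Build directed edge weights: (u, v) counts how often v directly follows u.

    `sentences` is a list of token lists. This construction is an assumption
    made for illustration only.
    """
    weights = defaultdict(int)
    for tokens in sentences:
        for u, v in zip(tokens, tokens[1:]):
            weights[(u, v)] += 1
    return dict(weights)


def selectivity(weights):
    """Return (in_selectivity, out_selectivity) dictionaries keyed by node.

    Selectivity is taken here as strength / degree (assumed reading of
    "average strength"). Nodes with zero degree in a direction get 0.0.
    """
    in_strength, out_strength = defaultdict(int), defaultdict(int)
    in_degree, out_degree = defaultdict(int), defaultdict(int)
    for (u, v), w in weights.items():
        out_strength[u] += w
        out_degree[u] += 1
        in_strength[v] += w
        in_degree[v] += 1

    nodes = {n for edge in weights for n in edge}
    in_sel = {
        n: in_strength[n] / in_degree[n] if in_degree.get(n) else 0.0
        for n in nodes
    }
    out_sel = {
        n: out_strength[n] / out_degree[n] if out_degree.get(n) else 0.0
        for n in nodes
    }
    return in_sel, out_sel


if __name__ == "__main__":
    toy = [["a", "b", "c"], ["a", "b", "d"], ["e", "b", "c"]]
    in_sel, out_sel = selectivity(build_cooccurrence(toy))
    # Rank candidate keywords by in-selectivity, highest first.
    for node in sorted(in_sel, key=in_sel.get, reverse=True):
        print(node, in_sel[node], out_sel[node])
```

Ranking nodes by these values gives a candidate list that depends only on network structure, which is consistent with the abstract's point that no linguistic knowledge is required.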
Similar resources
Keyword extraction: a review of methods and approaches
This paper presents a survey of methods and approaches for the keyword extraction task. In addition to systematizing the methods, the paper gathers a comprehensive review of existing research. Related work on keyword extraction is elaborated for supervised and unsupervised methods, with special emphasis on graph-based methods as well as Croatian keyword extraction. Selectivity-based keyword extracti...
Toward Network-based Keyword Extraction from Multitopic Web Documents
In this paper we analyse the selectivity measure calculated from the complex network for the task of automatic keyword extraction. Texts collected from different web sources (portals, forums) are represented as directed and weighted co-occurrence complex networks of words. Words are nodes, and a link is established between two nodes if the words directly co-occur within a sentence. We ...
Feature Extraction and Clustering of Croatian News Sources
This paper presents the design of a system for feature extraction and classification of news articles from Croatian news sources. An overview of supervised and unsupervised text classification and clustering machine learning techniques is presented. The techniques described are those most widely used for text classification tasks. The paper discusses a number of issues particular to text classi...
Keyword extraction of radio news using domain identification based on categories of an encyclopedia
In this paper, we propose a keyword extraction method for dictation of radio news that spans several domains. In our method, newspaper articles that are automatically classified into suitable domains are used to calculate feature vectors. The feature vectors capture term-domain interdependence and are used to select a suitable domain for each part of the radio news.
Journal title: CoRR
Volume: abs/1407.4723
Publication date: 2014